Independent two-sample t-test (t_test_independent)#

The independent (unpaired) two-sample t-test asks:

“Do these two independent groups plausibly come from populations with the same mean?”

It is used when you have:

  • a continuous outcome (interval/ratio scale)

  • two independent groups (different people/items; not paired / repeated measurements)

  • interest in a difference in means

This notebook covers:

  • Student’s t-test (assumes equal variances)

  • Welch’s t-test (does not assume equal variances; often the safer default)

Learning goals#

By the end you should be able to:

  • choose the right t-test variant (Student vs Welch)

  • write down \(H_0\) / \(H_1\) and pick one- vs two-sided alternatives

  • compute the t-statistic and degrees of freedom from scratch (NumPy)

  • understand what the p-value is (and what it is not)

  • interpret results using both p-values and confidence intervals

  • build intuition with simulations and Plotly visuals

import math
import os
import platform
from dataclasses import dataclass
from typing import Literal

import numpy as np

import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
from plotly.subplots import make_subplots

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

np.set_printoptions(precision=6, suppress=True)
rng = np.random.default_rng(42)

print("Python", platform.python_version())
print("NumPy", np.__version__)

# SciPy is optional: we only use it for cross-checks in a later section.
try:
    import scipy
    from scipy.stats import t as scipy_t
    from scipy.stats import ttest_ind

    HAS_SCIPY = True
    print("SciPy", scipy.__version__)
except Exception as e:
    HAS_SCIPY = False
    print("SciPy not available:", repr(e))
Python 3.12.9
NumPy 1.26.2
SciPy 1.15.0

Prerequisites (quick)#

Math / stats:

  • sample mean and (unbiased) sample variance

  • the idea of a standard error (estimate of sampling uncertainty)

  • basic probability: “area under a curve”

Tools:

  • NumPy

  • Plotly (for visuals)

1) When to use the independent t-test#

Use it when:

  • you compare the means of a numeric variable between two independent groups

  • you can treat observations as independent within and across groups

  • the outcome is approximately normal within each group (or sample sizes are large enough for the CLT to help)

Do not use it when:

  • the same subject/item is measured twice → use a paired t-test

  • you have >2 groups → use ANOVA (or multiple-comparisons-corrected pairwise tests)

  • you care about medians / heavy-tailed distributions / strong outliers → consider Mann–Whitney U, a permutation test, or robust methods

2) Hypotheses and alternatives#

Let \(\mu_1\) and \(\mu_2\) be the population means for group 1 and group 2.

Most common (two-sided):

\[ H_0: \mu_1 - \mu_2 = 0 \qquad \text{vs} \qquad H_1: \mu_1 - \mu_2 \ne 0 \]

One-sided alternatives are also possible:

  • greater: \(H_1: \mu_1 - \mu_2 > 0\)

  • less: \(H_1: \mu_1 - \mu_2 < 0\)

Pick the alternative before looking at the data.

3) The test statistic (Student vs Welch)#

We observe samples:

  • group 1: \(x_1,\dots,x_{n_1}\)

  • group 2: \(y_1,\dots,y_{n_2}\)

Compute sample means:

\[ \bar{x} = \frac{1}{n_1}\sum_{i=1}^{n_1} x_i, \qquad \bar{y} = \frac{1}{n_2}\sum_{j=1}^{n_2} y_j \]

and unbiased sample variances:

\[ s_x^2 = \frac{1}{n_1-1}\sum_{i=1}^{n_1}(x_i-\bar{x})^2, \qquad s_y^2 = \frac{1}{n_2-1}\sum_{j=1}^{n_2}(y_j-\bar{y})^2 \]

The difference in sample means is:

\[ \Delta = \bar{x} - \bar{y}. \]

The t-test turns this into a signal-to-noise ratio:

\[ t = \frac{\Delta}{\mathrm{SE}(\Delta)}, \]

where \(\mathrm{SE}(\Delta)\) is the standard error of the difference.

Student’s t-test (equal variances)#

Assumes \(\sigma_x^2 = \sigma_y^2\).

Pooled variance:

\[ s_p^2 = \frac{(n_1-1)s_x^2 + (n_2-1)s_y^2}{n_1+n_2-2} \]

Standard error:

\[ \mathrm{SE}(\Delta) = \sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} \]

Degrees of freedom:

\[ \nu = n_1 + n_2 - 2 \]
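As a quick numeric sanity check, the pooled quantities can be computed directly from the formulas above (a sketch with made-up sample values):

```python
import numpy as np

x = np.array([4.1, 5.0, 4.6, 4.9, 5.3])
y = np.array([5.2, 5.8, 5.5, 6.1])

n1, n2 = x.size, y.size
s2x, s2y = x.var(ddof=1), y.var(ddof=1)  # unbiased sample variances

df = n1 + n2 - 2                              # nu = n1 + n2 - 2 = 7
sp2 = ((n1 - 1) * s2x + (n2 - 1) * s2y) / df  # pooled variance
se = np.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))     # SE of the mean difference
t = (x.mean() - y.mean()) / se

print(f"df={df}, se={se:.4f}, t={t:.4f}")
```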

Welch’s t-test (unequal variances)#

Does not assume equal variances.

Standard error:

\[ \mathrm{SE}(\Delta) = \sqrt{\frac{s_x^2}{n_1}+\frac{s_y^2}{n_2}} \]

Approximate degrees of freedom (Welch–Satterthwaite):

\[ \nu \approx \frac{\left(\frac{s_x^2}{n_1}+\frac{s_y^2}{n_2}\right)^2}{\frac{(s_x^2/n_1)^2}{n_1-1} + \frac{(s_y^2/n_2)^2}{n_2-1}} \]
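The Welch degrees of freedom can be sketched the same way (made-up values, with extra spread in the second group). Note that \(\nu\) is generally not an integer, and it always lands between \(\min(n_1-1, n_2-1)\) and \(n_1+n_2-2\):

```python
import numpy as np

x = np.array([4.1, 5.0, 4.6, 4.9, 5.3])
y = np.array([5.2, 7.8, 5.5, 9.1])  # noticeably larger variance

n1, n2 = x.size, y.size
v1 = x.var(ddof=1) / n1  # s_x^2 / n1
v2 = y.var(ddof=1) / n2  # s_y^2 / n2

se = np.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))  # Welch–Satterthwaite

print(f"se={se:.4f}, df={df:.2f}")
```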

From \(t\) to a p-value#

Under \(H_0\) (and assumptions), the statistic follows a Student t distribution:

\[ t \sim t_{\nu}. \]

The p-value is a tail probability under that reference distribution (two-sided example):

\[ p = P(|T| \ge |t_{\text{obs}}| \mid H_0). \]
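Concretely, each choice of alternative corresponds to a different tail area of the same reference distribution. A sketch using `scipy.stats.t` (which this notebook otherwise treats as optional), with an arbitrary observed statistic and \(\nu\):

```python
from scipy.stats import t as t_dist

t_obs, df = 2.1, 30.0

p_two = 2.0 * t_dist.sf(abs(t_obs), df)  # two-sided: P(|T| >= |t_obs|)
p_greater = t_dist.sf(t_obs, df)         # greater:   P(T >= t_obs)
p_less = t_dist.cdf(t_obs, df)           # less:      P(T <= t_obs)

print(f"two-sided={p_two:.4f}, greater={p_greater:.4f}, less={p_less:.4f}")
```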

4) What the p-value means (precisely)#

A p-value is:

  • A probability computed under the null model.

  • The probability of getting a result at least as extreme as observed, if \(H_0\) were true.

A p-value is not:

  • the probability that \(H_0\) is true

  • the probability that the result happened “by chance” (without defining a model)

  • a measure of practical importance (effect size matters!)

5) NumPy-only implementation#

We’ll implement:

  1. summary stats (\(\bar{x}\), \(s^2\))

  2. the t-statistic + degrees of freedom (Student or Welch)

  3. the Student-t PDF

  4. a numerical CDF via integration (needed for p-values and critical values)

Note: numerical integration is for learning; for production-grade accuracy/performance, prefer scipy.stats.

Alternative = Literal["two-sided", "less", "greater"]


def _as_1d_float(x, name: str) -> np.ndarray:
    arr = np.asarray(x, dtype=float).reshape(-1)
    if arr.size < 2:
        raise ValueError(f"{name} must have at least 2 observations")
    if np.any(~np.isfinite(arr)):
        raise ValueError(f"{name} contains non-finite values")
    return arr


@dataclass(frozen=True)
class TDistLookup:
    df: float
    t_max: float
    xs: np.ndarray
    cdf_pos: np.ndarray


def t_pdf(x: np.ndarray | float, df: float) -> np.ndarray:
    '''Student-t PDF (NumPy-only).'''
    x = np.asarray(x, dtype=float)
    df = float(df)
    if not np.isfinite(df) or df <= 0:
        raise ValueError("df must be finite and > 0")

    # pdf(x) = Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)) * (1 + x^2/df)^(-(df+1)/2)
    log_c = (
        math.lgamma((df + 1.0) / 2.0)
        - 0.5 * (math.log(df) + math.log(math.pi))
        - math.lgamma(df / 2.0)
    )
    c = math.exp(log_c)
    return c * (1.0 + (x * x) / df) ** (-(df + 1.0) / 2.0)


def make_t_lookup(df: float, *, t_max: float = 12.0, n: int = 60_000) -> TDistLookup:
    '''Approximate the t CDF on [0, t_max] via cumulative trapezoid integration.'''
    df = float(df)
    t_max = float(t_max)
    n = int(n)

    if not np.isfinite(df) or df <= 0:
        raise ValueError("df must be finite and > 0")
    if not np.isfinite(t_max) or t_max <= 0:
        raise ValueError("t_max must be finite and > 0")
    if n < 1000:
        raise ValueError("n must be >= 1000 for a reasonable CDF approximation")

    xs = np.linspace(0.0, t_max, n + 1)
    pdf = t_pdf(xs, df)
    dx = float(xs[1] - xs[0])

    # Integral from 0..xs[i] (for i>=1)
    area = np.cumsum((pdf[:-1] + pdf[1:]) * 0.5 * dx)

    cdf_pos = np.empty_like(xs)
    cdf_pos[0] = 0.5
    cdf_pos[1:] = 0.5 + area
    cdf_pos = np.clip(cdf_pos, 0.0, 1.0)

    return TDistLookup(df=df, t_max=t_max, xs=xs, cdf_pos=cdf_pos)


def t_cdf(t: np.ndarray | float, lookup: TDistLookup) -> np.ndarray:
    '''Student-t CDF via interpolation from a precomputed lookup table.'''
    t = np.asarray(t, dtype=float)
    t_abs = np.abs(t)

    cdf_abs = np.interp(t_abs, lookup.xs, lookup.cdf_pos, left=0.5, right=1.0)
    return np.where(t >= 0, cdf_abs, 1.0 - cdf_abs)


def t_ppf(p: np.ndarray | float, lookup: TDistLookup) -> np.ndarray:
    '''Inverse CDF (quantile) via interpolation.'''
    p = np.asarray(p, dtype=float)
    if np.any((p <= 0.0) | (p >= 1.0)):
        raise ValueError("p must be in (0, 1)")

    sign = np.where(p >= 0.5, 1.0, -1.0)
    p_abs = np.where(p >= 0.5, p, 1.0 - p)

    t_abs = np.interp(p_abs, lookup.cdf_pos, lookup.xs)
    return sign * t_abs


def t_p_value(t_obs: float, lookup: TDistLookup, *, alternative: Alternative) -> float:
    '''p-value for a t-statistic under t(df).'''
    cdf = float(t_cdf(t_obs, lookup))
    if alternative == "two-sided":
        p = 2.0 * min(cdf, 1.0 - cdf)
    elif alternative == "greater":
        p = 1.0 - cdf
    elif alternative == "less":
        p = cdf
    else:
        raise ValueError("alternative must be one of: 'two-sided', 'less', 'greater'")

    return float(np.clip(p, 0.0, 1.0))


@dataclass(frozen=True)
class IndependentTTestResult:
    t: float
    df: float
    p_value: float
    alternative: Alternative
    equal_var: bool

    n1: int
    n2: int
    mean1: float
    mean2: float
    var1: float
    var2: float

    diff: float
    se: float
    ci: tuple[float, float]
    alpha: float

    cohens_d: float
    cohens_d_method: str


def t_test_independent_numpy(
    x,
    y,
    *,
    equal_var: bool = False,
    alternative: Alternative = "two-sided",
    alpha: float = 0.05,
    t_max: float = 12.0,
    lookup_n: int = 60_000,
) -> IndependentTTestResult:
    '''Independent two-sample t-test (Student or Welch), using NumPy-only p-values.'''
    x = _as_1d_float(x, "x")
    y = _as_1d_float(y, "y")

    n1, n2 = int(x.size), int(y.size)
    mean1, mean2 = float(x.mean()), float(y.mean())
    var1, var2 = float(x.var(ddof=1)), float(y.var(ddof=1))
    diff = mean1 - mean2

    if equal_var:
        df = float(n1 + n2 - 2)
        sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
        se = math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
        cohens_d = diff / math.sqrt(sp2)
        cohens_d_method = "pooled SD"
    else:
        se = math.sqrt(var1 / n1 + var2 / n2)
        df = float(
            (var1 / n1 + var2 / n2) ** 2
            / ((var1 / n1) ** 2 / (n1 - 1) + (var2 / n2) ** 2 / (n2 - 1))
        )
        # A common fallback is to standardize by the average SD (not perfect, but useful for scale).
        cohens_d = diff / math.sqrt(0.5 * (var1 + var2))
        cohens_d_method = "average SD"

    if se == 0.0:
        raise ValueError("Standard error is zero; check data (all values identical?)")

    t_stat = diff / se

    if not (0.0 < alpha < 1.0):
        raise ValueError("alpha must be in (0, 1)")

    lookup = make_t_lookup(df, t_max=t_max, n=lookup_n)
    p_value = t_p_value(t_stat, lookup, alternative=alternative)

    t_crit = float(t_ppf(1.0 - alpha / 2.0, lookup))  # always two-sided CI
    ci = (diff - t_crit * se, diff + t_crit * se)

    return IndependentTTestResult(
        t=float(t_stat),
        df=float(df),
        p_value=float(p_value),
        alternative=alternative,
        equal_var=bool(equal_var),
        n1=n1,
        n2=n2,
        mean1=mean1,
        mean2=mean2,
        var1=var1,
        var2=var2,
        diff=float(diff),
        se=float(se),
        ci=(float(ci[0]), float(ci[1])),
        alpha=float(alpha),
        cohens_d=float(cohens_d),
        cohens_d_method=cohens_d_method,
    )


def format_result(r: IndependentTTestResult) -> str:
    ci_level = int(round((1.0 - r.alpha) * 100))

    return "\n".join(
        [
            f"t = {r.t:.4f} | df = {r.df:.2f} | p = {r.p_value:.6g} ({r.alternative}, equal_var={r.equal_var})",
            f"mean1 = {r.mean1:.4f} (n={r.n1}), mean2 = {r.mean2:.4f} (n={r.n2})",
            f"diff = mean1 - mean2 = {r.diff:.4f}, SE(diff) = {r.se:.4f}",
            f"{ci_level}% CI for diff: [{r.ci[0]:.4f}, {r.ci[1]:.4f}]",
            f"Cohen's d ({r.cohens_d_method}): {r.cohens_d:.4f}",
        ]
    )

6) Worked example (synthetic data)#

We create two independent groups with a small mean difference.

Tip: In real analyses, build these arrays from your dataset after filtering into two groups.

n1, n2 = 25, 28

group1 = rng.normal(loc=0.0, scale=1.0, size=n1)
group2 = rng.normal(loc=0.5, scale=1.0, size=n2)

print("group1: mean=", float(group1.mean()), "sd=", float(group1.std(ddof=1)))
print("group2: mean=", float(group2.mean()), "sd=", float(group2.std(ddof=1)))
group1: mean= -0.035389987592057665 sd= 0.8310923822923664
group2: mean= 0.6753054649687451 sd= 0.7321879696862332
# Visualize the raw distributions

fig = go.Figure()
fig.add_trace(
    go.Histogram(
        x=group1,
        name="group1",
        histnorm="probability density",
        opacity=0.6,
        marker_color="#1f77b4",
    )
)
fig.add_trace(
    go.Histogram(
        x=group2,
        name="group2",
        histnorm="probability density",
        opacity=0.6,
        marker_color="#d62728",
    )
)

m1 = float(group1.mean())
m2 = float(group2.mean())
fig.add_shape(type="line", x0=m1, x1=m1, y0=0, y1=1, yref="paper", line=dict(color="#1f77b4", dash="dash"))
fig.add_shape(type="line", x0=m2, x1=m2, y0=0, y1=1, yref="paper", line=dict(color="#d62728", dash="dash"))

fig.update_layout(
    title="Independent groups: histogram (density) with sample means",
    barmode="overlay",
    xaxis_title="Value",
    yaxis_title="Density",
)
fig.show()

fig2 = go.Figure()
fig2.add_trace(go.Violin(y=group1, name="group1", box_visible=True, meanline_visible=True, line_color="#1f77b4"))
fig2.add_trace(go.Violin(y=group2, name="group2", box_visible=True, meanline_visible=True, line_color="#d62728"))
fig2.update_layout(title="Violin + box plot", yaxis_title="Value")
fig2.show()
# Run the test (Student vs Welch)

res_student = t_test_independent_numpy(group1, group2, equal_var=True)
res_welch = t_test_independent_numpy(group1, group2, equal_var=False)

print("Student (equal variances):")
print(format_result(res_student))
print("\nWelch (unequal variances):")
print(format_result(res_welch))
Student (equal variances):
t = -3.3101 | df = 51.00 | p = 0.00171715 (two-sided, equal_var=True)
mean1 = -0.0354 (n=25), mean2 = 0.6753 (n=28)
diff = mean1 - mean2 = -0.7107, SE(diff) = 0.2147
95% CI for diff: [-1.1417, -0.2797]
Cohen's d (pooled SD): -0.9108

Welch (unequal variances):
t = -3.2861 | df = 48.21 | p = 0.00189875 (two-sided, equal_var=False)
mean1 = -0.0354 (n=25), mean2 = 0.6753 (n=28)
diff = mean1 - mean2 = -0.7107, SE(diff) = 0.2163
95% CI for diff: [-1.1455, -0.2759]
Cohen's d (average SD): -0.9074
def plot_two_sided_p_value_region(
    t_obs: float,
    df: float,
    p_value: float,
    *,
    x_max: float = 6.0,
    n: int = 2000,
) -> go.Figure:
    xs = np.linspace(-x_max, x_max, n)
    ys = t_pdf(xs, df)

    fig = go.Figure()
    fig.add_trace(go.Scatter(x=xs, y=ys, mode="lines", name=f"t(df={df:.2f})"))

    t_abs = abs(float(t_obs))

    xr = np.linspace(t_abs, x_max, 800)
    yr = t_pdf(xr, df)

    # Right tail polygon
    fig.add_trace(
        go.Scatter(
            x=np.r_[xr, xr[::-1]],
            y=np.r_[yr, np.zeros_like(yr)],
            fill="toself",
            fillcolor="rgba(214, 39, 40, 0.25)",
            line=dict(color="rgba(214, 39, 40, 0.45)"),
            hoverinfo="skip",
            showlegend=False,
        )
    )

    # Left tail polygon (mirror)
    xl = -xr[::-1]  # from -x_max .. -t_abs
    yl = yr[::-1]
    fig.add_trace(
        go.Scatter(
            x=np.r_[xl, xl[::-1]],
            y=np.r_[yl, np.zeros_like(yl)],
            fill="toself",
            fillcolor="rgba(214, 39, 40, 0.25)",
            line=dict(color="rgba(214, 39, 40, 0.45)"),
            hoverinfo="skip",
            showlegend=False,
        )
    )

    y_top = float(np.max(ys))
    fig.add_shape(
        type="line",
        x0=t_obs,
        x1=t_obs,
        y0=0.0,
        y1=y_top,
        line=dict(color="black", dash="dash"),
    )

    fig.add_annotation(
        x=t_obs,
        y=y_top * 0.9,
        text=f"t = {t_obs:.3f}<br>p = {p_value:.3g}",
        showarrow=True,
        arrowhead=2,
        ax=40,
        ay=-40,
    )

    fig.update_layout(
        title="Two-sided p-value = shaded tail area under t(df)",
        xaxis_title="t",
        yaxis_title="Density",
    )
    return fig


plot_two_sided_p_value_region(res_welch.t, res_welch.df, res_welch.p_value, x_max=6.0).show()

7) How to interpret the output#

A practical checklist:

  1. Direction: the sign of t matches the sign of mean1 - mean2.

  2. Statistical significance: compare p to your threshold \(\alpha\) (commonly 0.05).

  3. Confidence interval: a \((1-\alpha)\) CI for \((\mu_1-\mu_2)\) that excludes 0 corresponds to rejecting \(H_0\) at level \(\alpha\).

  4. Practical importance: consider effect size (e.g. Cohen’s d) and whether the CI is meaningfully far from 0.

  5. Assumptions: independence (design), distribution shape/outliers, variance structure.

for name, r in [("Student", res_student), ("Welch", res_welch)]:
    includes_zero = r.ci[0] <= 0.0 <= r.ci[1]
    print(f"{name}: CI includes 0? {includes_zero}")
Student: CI includes 0? False
Welch: CI includes 0? False

8) Assumptions, diagnostics, and pitfalls#

Independence (most important)#

The t-test assumes each observation is independent. This is usually a study design question (random sampling / random assignment).

Approximate normality (within each group)#

  • The t-test is fairly robust for moderate/large sample sizes.

  • With strong skew / heavy tails / outliers, the mean can behave badly.

Equal variances (only for Student’s t-test)#

If you are not comfortable assuming equal variances, use Welch’s t-test.
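A brief simulation sketch of why (the parameters here are made up for illustration): when the smaller group has the larger variance and \(H_0\) is true, Student's pooled test can reject far more often than the nominal \(\alpha\), while Welch stays close to it. This uses `scipy.stats.ttest_ind`:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
alpha, sims = 0.05, 2000
n1, n2 = 10, 40  # small group gets the large variance: worst case for pooling

rej = {"student": 0, "welch": 0}
for _ in range(sims):
    x = rng.normal(0.0, 3.0, size=n1)  # H0 is true: equal means, unequal SDs
    y = rng.normal(0.0, 1.0, size=n2)
    rej["student"] += int(ttest_ind(x, y, equal_var=True).pvalue < alpha)
    rej["welch"] += int(ttest_ind(x, y, equal_var=False).pvalue < alpha)

print({name: count / sims for name, count in rej.items()})
```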

Outliers#

Because the test compares means, a single extreme value can move the mean and change the result.

# Outlier sensitivity demo: add a single extreme value to group2

outlier_value = 6.0
group2_outlier = np.concatenate([group2, [outlier_value]])

res_welch_outlier = t_test_independent_numpy(group1, group2_outlier, equal_var=False)

print("Welch (baseline):")
print(format_result(res_welch))
print("\nWelch (group2 with one added outlier):")
print(format_result(res_welch_outlier))

fig = make_subplots(rows=1, cols=2, subplot_titles=("Original data", "After adding an outlier to group2"))

fig.add_trace(
    go.Histogram(
        x=group1,
        name="group1",
        histnorm="probability density",
        opacity=0.6,
        marker_color="#1f77b4",
    ),
    row=1,
    col=1,
)
fig.add_trace(
    go.Histogram(
        x=group2,
        name="group2",
        histnorm="probability density",
        opacity=0.6,
        marker_color="#d62728",
    ),
    row=1,
    col=1,
)

fig.add_trace(
    go.Histogram(
        x=group1,
        name="group1",
        histnorm="probability density",
        opacity=0.6,
        marker_color="#1f77b4",
        showlegend=False,
    ),
    row=1,
    col=2,
)
fig.add_trace(
    go.Histogram(
        x=group2_outlier,
        name="group2 (+ outlier)",
        histnorm="probability density",
        opacity=0.6,
        marker_color="#d62728",
    ),
    row=1,
    col=2,
)

fig.update_layout(
    title="Outliers can change means (and therefore the t-test)",
    barmode="overlay",
)
fig.update_xaxes(title_text="Value", row=1, col=1)
fig.update_xaxes(title_text="Value", row=1, col=2)
fig.update_yaxes(title_text="Density", row=1, col=1)
fig.update_yaxes(title_text="Density", row=1, col=2)
fig.show()
Welch (baseline):
t = -3.2861 | df = 48.21 | p = 0.00189875 (two-sided, equal_var=False)
mean1 = -0.0354 (n=25), mean2 = 0.6753 (n=28)
diff = mean1 - mean2 = -0.7107, SE(diff) = 0.2163
95% CI for diff: [-1.1455, -0.2759]
Cohen's d (average SD): -0.9074

Welch (group2 with one added outlier):
t = -3.1784 | df = 49.48 | p = 0.00255332 (two-sided, equal_var=False)
mean1 = -0.0354 (n=25), mean2 = 0.8589 (n=29)
diff = mean1 - mean2 = -0.8943, SE(diff) = 0.2814
95% CI for diff: [-1.4596, -0.3290]
Cohen's d (average SD): -0.8555

9) Simulation intuition: Type I error and power#

A good mental model:

  • Type I error (false positive rate): if \(H_0\) is true, how often do we reject at level \(\alpha\)?

  • Power: if there is a real mean difference, how often do we reject?

Below we simulate many experiments and visualize:

  • the distribution of t-statistics under \(H_0\)

  • the distribution of p-values under \(H_0\) (should be roughly Uniform(0,1))

  • a simple power curve as the true mean difference grows

For speed we use Student’s t-test here (equal variances, fixed df).

def student_t_stats_matrix(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    '''Vectorized Student t-statistics for many simulated experiments.

    X: shape (n_sims, n1)
    Y: shape (n_sims, n2)
    '''
    n1 = X.shape[1]
    n2 = Y.shape[1]

    m1 = X.mean(axis=1)
    m2 = Y.mean(axis=1)

    v1 = X.var(axis=1, ddof=1)
    v2 = Y.var(axis=1, ddof=1)

    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    se = np.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))

    return (m1 - m2) / se


alpha = 0.05
n_sims = 8000
n1 = n2 = 20

# Under H0: same mean, same variance
X0 = rng.normal(0.0, 1.0, size=(n_sims, n1))
Y0 = rng.normal(0.0, 1.0, size=(n_sims, n2))

df0 = n1 + n2 - 2
lookup0 = make_t_lookup(df0, t_max=12.0, n=80_000)

t_stats0 = student_t_stats_matrix(X0, Y0)
cdf0 = t_cdf(t_stats0, lookup0)
p_vals0 = 2.0 * np.minimum(cdf0, 1.0 - cdf0)

print("Empirical Type I error at alpha=0.05:", float(np.mean(p_vals0 < alpha)))

# Plot p-values under H0 (should be ~Uniform(0,1))
fig = go.Figure()
fig.add_trace(go.Histogram(x=p_vals0, nbinsx=40, histnorm="probability density", name="p-values"))
fig.add_trace(go.Scatter(x=[0, 1], y=[1, 1], mode="lines", name="Uniform(0,1) density=1"))
fig.update_layout(
    title="p-values under H0 (should be roughly uniform)",
    xaxis_title="p-value",
    yaxis_title="Density",
)
fig.show()

# Plot t-statistics under H0 with theoretical t pdf
x_line = np.linspace(-6, 6, 1400)
pdf_line = t_pdf(x_line, df0)

fig = go.Figure()
fig.add_trace(go.Histogram(x=t_stats0, nbinsx=60, histnorm="probability density", name="simulated t"))
fig.add_trace(go.Scatter(x=x_line, y=pdf_line, mode="lines", name=f"t pdf (df={df0})"))
fig.update_layout(
    title="t-statistics under H0 (simulation vs theoretical t density)",
    xaxis_title="t",
    yaxis_title="Density",
)
fig.show()
Empirical Type I error at alpha=0.05: 0.04875
# A simple power curve: how power increases as the true mean difference grows

deltas = np.linspace(0.0, 1.2, 9)  # true |mu1 - mu2| when sigma=1
powers = []

for delta in deltas:
    X = rng.normal(0.0, 1.0, size=(n_sims, n1))
    Y = rng.normal(delta, 1.0, size=(n_sims, n2))

    t_stats = student_t_stats_matrix(X, Y)
    cdf = t_cdf(t_stats, lookup0)
    p_vals = 2.0 * np.minimum(cdf, 1.0 - cdf)
    powers.append(float(np.mean(p_vals < alpha)))

fig = go.Figure()
fig.add_trace(go.Scatter(x=deltas, y=powers, mode="lines+markers"))
fig.update_layout(
    title="Power vs true mean difference (n1=n2=20, sigma=1, alpha=0.05)",
    xaxis_title="True |mu1 - mu2|",
    yaxis_title="Power (P(reject H0))",
)
fig.show()

10) Practical usage (SciPy cross-check)#

In real work, you typically use scipy.stats.ttest_ind:

  • equal_var=True → Student’s t-test

  • equal_var=False → Welch’s t-test

Below we compare SciPy’s result to our NumPy-only implementation.

if HAS_SCIPY:
    r_student = ttest_ind(group1, group2, equal_var=True)
    r_welch = ttest_ind(group1, group2, equal_var=False)

    print("SciPy Student:")
    print("  t=", float(r_student.statistic), "p=", float(r_student.pvalue))
    print("Our Student:")
    print("  t=", res_student.t, "p=", res_student.p_value)

    print("\nSciPy Welch:")
    print("  t=", float(r_welch.statistic), "p=", float(r_welch.pvalue))
    print("Our Welch:")
    print("  t=", res_welch.t, "p=", res_welch.p_value)
else:
    print("SciPy not installed; skipping cross-check.")
SciPy Student:
  t= -3.3100619557330977 p= 0.00171714967615853
Our Student:
  t= -3.3100619557330977 p= 0.0017171497826875548

SciPy Welch:
  t= -3.286069304555434 p= 0.0018987486385477889
Our Welch:
  t= -3.2860693045554346 p= 0.001898748756941293

Exercises#

  1. Change the sample sizes (n1, n2) and see how the standard error and power change.

  2. Try a clearly non-normal distribution (e.g. exponential) and compare the t-test to a permutation test.

  3. Simulate unequal variances and compare Student vs Welch rejection rates.

  4. Compute and visualize a bootstrap CI for \((\mu_1-\mu_2)\) and compare it to the t-based CI.

References#

  • Student (1908), The probable error of a mean

  • Welch (1947), The generalization of “Student’s” problem when several different population variances are involved

  • SciPy docs: scipy.stats.ttest_ind